The noise factor: irrelevant search results on the World Wide Web

Author

  • Wendy T. Lucas
Abstract

Finding information on the World Wide Web is easy; finding relevant information is not. While search engines provide a more directed approach to resource discovery than browsing, the pages they identify as “matching” a query often have little relevance to the information being sought. To examine the relationship between query terms and the pages they match, the queries used to locate five Web pages were collected over a five-month period, and the relevance of each page's content to those query terms was then analyzed. Page content was judged irrelevant to more than one-third of the queries. The search engines disagreed: for forty percent of those queries, a link to a test page appeared within the first one percent of all links retrieved. This supported the supposition that searches conducted with popular search engines often return too many links with little relevance to the query terms. An alternative approach is therefore proposed in which metadata, hyperlinks, and other subject-related HyperText Markup Language (HTML) tags are used to improve the effectiveness of Web queries. By relying on the structural components of HTML documents, it should be possible to conduct intelligent searches that yield more relevant results.
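The abstract proposes using metadata, hyperlinks, and other subject-related HTML tags to improve query effectiveness but does not specify an algorithm. The Python sketch below illustrates one possible reading of that idea, assuming a page is scored by where query terms occur in its structure: matches in the title, `<meta name="keywords">` / `<meta name="description">` content, headings, and anchor text count more than matches in ordinary body text. The field weights, the tokenizer, and the names `StructureExtractor` and `structural_score` are hypothetical illustrations, not the author's method.

```python
import re
from html.parser import HTMLParser

# Hypothetical weights: how much one query-term hit in each HTML structure counts.
FIELD_WEIGHTS = {"title": 3.0, "meta": 2.5, "heading": 2.0, "anchor": 1.5, "body": 1.0}
HEADINGS = ("h1", "h2", "h3", "h4", "h5", "h6")


def _tokens(text: str) -> list[str]:
    """Lowercase alphanumeric tokenizer (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class StructureExtractor(HTMLParser):
    """Collects words from title, meta keywords/description, headings, anchors, and body."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: dict[str, list[str]] = {name: [] for name in FIELD_WEIGHTS}
        self._stack: list[str] = []  # tracked tags whose text we are currently inside

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr = dict(attrs)
            if (attr.get("name") or "").lower() in ("keywords", "description"):
                self.fields["meta"].extend(_tokens(attr.get("content") or ""))
        elif tag in ("title", "a", "script", "style") or tag in HEADINGS:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self._stack:
            # Pop up to and including the most recent matching open tag.
            while self._stack and self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        top = self._stack[-1] if self._stack else None
        if top in ("script", "style"):
            return  # ignore non-content text
        if top is None:
            field = "body"
        elif top == "title":
            field = "title"
        elif top == "a":
            field = "anchor"
        else:
            field = "heading"
        self.fields[field].extend(_tokens(data))


def structural_score(html: str, query: str) -> float:
    """Sum of weighted query-term hits across the page's structural fields."""
    parser = StructureExtractor()
    parser.feed(html)
    parser.close()
    terms = set(_tokens(query))
    score = 0.0
    for field, words in parser.fields.items():
        hits = sum(1 for word in words if word in terms)
        score += FIELD_WEIGHTS[field] * hits
    return score
```

For example, for the page `<title>Web search noise</title><meta name="keywords" content="search engines, relevance"><h1>Relevance of results</h1><p>Some text about search.</p><a href="/x">search tips</a>` and the query `search relevance`, the sketch counts one title hit (3.0), two meta hits (5.0), one heading hit (2.0), one anchor hit (1.5), and one body hit (1.0), for a score of 12.5. Ranking candidate pages by such a score is only one way to favor pages whose structure, rather than incidental body text, reflects the query subject.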


Similar resources

A Technique for Improving Web Mining using Enhanced Genetic Algorithm

The World Wide Web is growing at a very fast pace and makes a great deal of information available to the public. Search engines use conventional methods to retrieve information on the Web; however, their search results can still be refined, and their accuracy is not high enough. One web mining method uses evolutionary algorithms, which search according to the user's interests...


Hierarchical Fuzzy Clustering Semantics (HFCS) in Web Document for Discovering Latent Semantics

This paper discusses the future of World Wide Web development, called the Semantic Web. Undoubtedly, Web service is one of the most important services on the Internet, which has had the greatest impact on the generalization of the Internet in human societies. Internet penetration has been an effective factor in growth of the volume of information on the Web. The massive growth of informat...


Cleaning Web Pages for Effective Web Content Mining

Classifying and mining noise-free web pages will improve the accuracy of search results as well as search speed, and may benefit webpage organization applications (e.g., keyword-based search engines and taxonomic web page categorization applications). Noise on web pages is irrelevant to the main content of the pages being mined, and includes advertisements, navigation bars, and copyright noti...


A study of master's students' mental models of the Google search engine

The World Wide Web (WWW) is a major channel of getting information and using web search engines is the most popular way of accessing information. This study aims to investigate master students’ mental model completeness level of Google web search engine. From the methodological perspective, this research is a practical one based on survey method. The sample population consisted of 30 master stu...


Comparative Study on Semantic Search Engines

The current World Wide Web, also known as Web 2.0, is an immense library of interlinked documents that are transferred by computers and presented to people. The search engine is considered the most important tool for discovering information on the WWW. In spite of extensive development and novel research in current search engine techniques, they are still syntactic in nature and display search re...



Publication date: 2000